{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Turkish ULMFiT from scratch"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "%reload_ext autoreload\n",
    "%autoreload 2\n",
    "%matplotlib inline\n",
    "\n",
    "from fastai import *\n",
    "from fastai.text import *"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "bs=128\n",
    "torch.cuda.set_device(2)\n",
    "data_path = Config.data_path()\n",
    "\n",
    "lang = 'tr'\n",
    "name = f'{lang}wiki'\n",
    "path = data_path/name\n",
    "path.mkdir(exist_ok=True, parents=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "mdl_path = path/'models'\n",
    "mdl_path.mkdir(exist_ok=True)\n",
    "lm_fns = [mdl_path/f'{lang}_wt', mdl_path/f'{lang}_wt_vocab']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Turkish wikipedia model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/home/sgugger/.fastai/data/trwiki/trwiki already exists; not downloading\n",
      "<doc id=\"10\" url=\"https://tr.wikipedia.org/wiki?curid=10\" title=\"Cengiz Han\">\r\n",
      "Cengiz Han\r\n",
      "\r\n",
      "Cengiz Han (\"Cenghis Khan\", \"Çinggis Haan\" ya da doğum adıyla Temuçin (anlamı: demirci), Moğolca: \"Чингис Хаан\" ya da \"Tengiz\" (anlamı: deniz), ; d. 1162 – ö. 18 Ağustos 1227), Moğol komutan, hükümdar ve Moğol İmparatorluğu'nun kurucusudur. Cengiz Han, 13. Yüzyılın başında Orta Asya'daki tüm göçebe bozkır kavimlerini birleştirerek bir ulus haline getirdi ve o ulusu \"Moğol\" siyasi kimliği çatısı altında topladı. Dünya tarihinin en büyük askeri dehalarından biri olarak kabul edilen Cengiz Han, hükümdarlığı döneminde 1206-1227 arasında Kuzey Çin'deki Batı Xia ve Jin Hanedanı, Türkistan'daki Kara Hıtay, Maveraünnehir, Harezm, Horasan ve İran'daki Harzemşahlar ile Kafkasya'da Gürcüler, Deşt-i Kıpçak'taki Rus Knezlikleri ve Kıpçaklar ile İdil Bulgarları üzerine gerçekleştirilen seferler sonucunda Pasifik Okyanusu'ndan Hazar Denizi’ne ve Karadeniz'in kuzeyine kadar uzanan bir imparatorluk kurdu.\r\n"
     ]
    }
   ],
   "source": [
    "from nlputils import split_wiki,get_wiki\n",
    "\n",
    "get_wiki(path,lang)\n",
    "!head -n4 {path}/{name}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/home/sgugger/.fastai/data/trwiki/docs already exists; not splitting\n"
     ]
    }
   ],
   "source": [
    "dest = split_wiki(path,lang)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Turkish is an [Agglutinative_language](https://en.wikipedia.org/wiki/Agglutinative_language) so it needs special care!\n",
    "\n",
    "![Turkish morphemes example](images/turkish.jpg)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "data = (TextList.from_folder(dest, processor=[OpenFileProcessor(), SPProcessor()])\n",
    "        .split_by_rand_pct(0.1, seed=42)\n",
    "        .label_for_lm()\n",
    "        .databunch(bs=bs, num_workers=1))\n",
    "\n",
    "data.save(f'{lang}_databunch')\n",
    "len(data.vocab.itos),len(data.train_ds)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = load_data(dest, f'{lang}_databunch', bs=bs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>idx</th>\n",
       "      <th>text</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>▁ele ▁geçirerek ▁politik ▁bir ▁özerklik ▁verdiğini ▁belirtmek le ▁birlikte ▁xxmaj ▁tibet ' in ▁yalnızca ▁1913 - 1950 ▁yılları ▁arasında ▁xxmaj ▁çin ' in ▁politik ▁nüfuz undan ▁çıktığını , ▁bölgenin ▁tarihi ▁olarak ▁xxmaj ▁çin ' e ▁ait ▁olduğunu ▁düşünmektedir . ▁xxmaj ▁tibet ' in ▁kendi ▁kültür ▁ve ▁zenginlik lerinin ▁\" kültürel ▁bir ▁soykırım \" a ▁tabi ▁tutulduğu ▁da ▁iddialar ▁arasındadır . ▁xxmaj ▁çin ▁hükümeti ▁ise ▁bu ▁\" kültürel ▁soykırım \" ▁iddia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>▁dışı ▁kalan ▁olmasa ▁da , ▁o ▁yarışta ▁lastik ▁üreticisi ▁xxmaj ▁michel in ' le ▁yaşanan ▁problemler den ▁dolayı ▁güvenlik ▁gerekçesiyle ▁xxmaj ▁michel in ▁lastik leri ▁kullanan ▁tüm ▁takımlar ▁çekilmiş , ▁yarış a ▁sadece ▁6 ▁pilot ▁katılmıştır . ▁xxmaj ▁bunların ▁dışında ▁xxmaj ▁ ant ô ni o ▁xxmaj ▁pi zz onia ▁xxmaj ▁i ̇ talya ▁ve ▁sonrasında ▁sezonun ▁geri ▁kalanında ▁xxmaj ▁nick ▁xxmaj ▁he id feld ' in ▁yerine ▁yarışmıştır .</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>iye ▁oyunlarından ▁çok ▁farklı ▁bulduğunu ▁belirterek ▁taz iye yi ▁de ▁içinde ▁barındıran ▁benzersiz ▁bir ▁yapıt ▁olduğunu ▁yazdı . ▁&lt; ▁/ ▁doc &gt; ▁xxbos ▁xxmaj ▁anahtar sız ▁şifreleme ▁xxmaj ▁anahtar sız ▁şifreleme , ▁anahtar ▁kullanmaya n ▁kriptografik ▁algoritma lar , ▁veya ▁diğer ▁adlarıyla ▁xxmaj ▁veri ▁xxmaj ▁bütünlüğü ▁ve ▁xxmaj ▁özet ▁xxmaj ▁fonksiyonları , ▁veri ▁bütünlüğünü ▁garanti ▁etmek ▁için ▁kullanılan ▁xxup ▁md 5 , ▁xxup ▁sha -1 , ▁xxup ▁ rip em</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>rib ine ▁bağlı ▁olarak ▁akarsu ▁havza larında ▁artan ▁sediman ▁üretimi ▁f lü v yal ▁sistemlerde ▁etkili ▁olmuştur . ▁20 ▁yy ’ ın ▁ikinci ▁yarısından ▁itibaren ▁akarsular ▁üzerinde ▁çok ▁sayıda ▁baraj ▁yapılmıştır . ▁xxmaj ▁baraj ▁yapımı ▁nedeniyle , ▁akarsu nun ▁taşıdığı ▁sediman ▁miktarı ▁çevre nin ▁tahrip ▁edilmesi ▁nedeniyle ▁artmaktadır . ▁xxmaj ▁akarsu ▁yatakları ndan ▁kum ▁alınması , ▁akarsu nun ▁yatağı nın ▁yeniden ▁düzenlemesi ne ▁neden ▁olmaktadır . ▁xxmaj ▁teras lar ;</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>n ür lük \", ▁tek ▁başına ▁sensör ▁çözünürlüğü ne ▁bağlı ▁değildir . ▁\" mak sim um ▁sistem ▁çözünürlüğü \", ▁diğer ▁etkenler ▁( sen s ör ▁üzerinde ▁yer ▁alan ▁filtreler , ▁kullanılan ▁objektif in ▁optik ▁kalitesi ▁ve ▁diyafram ▁açıklığı ▁vb . ) ▁dışarıda ▁tutulur sa ▁aşağıdaki ▁formül ▁ile ▁hesaplanabilir ▁( birim ▁= ▁xxup ▁lp ▁/ ▁xxup ▁mm ▁xxmaj ▁i ̇ ng . : ▁ line ▁pa ir s ▁per ▁mili meter</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "data.show_batch()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "learn = language_model_learner(data, AWD_LSTM, drop_mult=0.1, wd=0.1, pretrained=False).to_fp16()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "lr = 3e-3\n",
    "lr *= bs/48  # Scale learning rate by batch size"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: left;\">\n",
       "      <th>epoch</th>\n",
       "      <th>train_loss</th>\n",
       "      <th>valid_loss</th>\n",
       "      <th>accuracy</th>\n",
       "      <th>time</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>4.337346</td>\n",
       "      <td>4.472464</td>\n",
       "      <td>0.297502</td>\n",
       "      <td>10:04</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>4.145160</td>\n",
       "      <td>4.359449</td>\n",
       "      <td>0.304195</td>\n",
       "      <td>10:29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>4.199796</td>\n",
       "      <td>4.386935</td>\n",
       "      <td>0.301542</td>\n",
       "      <td>10:29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>4.132432</td>\n",
       "      <td>4.329662</td>\n",
       "      <td>0.305723</td>\n",
       "      <td>10:30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>4.048324</td>\n",
       "      <td>4.241048</td>\n",
       "      <td>0.313223</td>\n",
       "      <td>10:32</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>5</td>\n",
       "      <td>3.943493</td>\n",
       "      <td>4.118834</td>\n",
       "      <td>0.323640</td>\n",
       "      <td>10:30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>6</td>\n",
       "      <td>3.857112</td>\n",
       "      <td>3.969563</td>\n",
       "      <td>0.337475</td>\n",
       "      <td>10:30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>7</td>\n",
       "      <td>3.695721</td>\n",
       "      <td>3.799551</td>\n",
       "      <td>0.354562</td>\n",
       "      <td>10:30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>8</td>\n",
       "      <td>3.602813</td>\n",
       "      <td>3.652005</td>\n",
       "      <td>0.371855</td>\n",
       "      <td>10:31</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>9</td>\n",
       "      <td>3.519019</td>\n",
       "      <td>3.596706</td>\n",
       "      <td>0.379076</td>\n",
       "      <td>10:31</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "learn.unfreeze()\n",
    "learn.fit_one_cycle(10, lr, moms=(0.8,0.7))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [],
   "source": [
    "learn.to_fp32().save(lm_fns[0], with_opt=False)\n",
    "learn.data.vocab.save(lm_fns[1].with_suffix('.pkl'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Turkish sentiment analysis"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "https://www.win.tue.nl/~mpechen/projects/smm/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Language model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[PosixPath('/home/sgugger/.fastai/data/trwiki/movies/tr_polarity.pos'),\n",
       " PosixPath('/home/sgugger/.fastai/data/trwiki/movies/tr_clas_databunch'),\n",
       " PosixPath('/home/sgugger/.fastai/data/trwiki/movies/models'),\n",
       " PosixPath('/home/sgugger/.fastai/data/trwiki/movies/tr_polarity.neg'),\n",
       " PosixPath('/home/sgugger/.fastai/data/trwiki/movies/tr_data_lm')]"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "path_clas = path/'movies'\n",
    "path_clas.ls()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>text</th>\n",
       "      <th>pos</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>gerçekten harika bir yapim birçok kez izledim ...</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>her izledigimde hayranlik duydugum gerçek klas...</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>gerçekten tarihi savas filmleri arasinda tarti...</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>aldigi ödülleri sonuna dek hak eden muhtesem b...</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>özgürlük denilince aklima gelen ilk film.bir b...</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                text  pos\n",
       "0  gerçekten harika bir yapim birçok kez izledim ...    1\n",
       "1  her izledigimde hayranlik duydugum gerçek klas...    1\n",
       "2  gerçekten tarihi savas filmleri arasinda tarti...    1\n",
       "3  aldigi ödülleri sonuna dek hak eden muhtesem b...    1\n",
       "4  özgürlük denilince aklima gelen ilk film.bir b...    1"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pos = (path_clas/'tr_polarity.pos').open(encoding='iso-8859-9').readlines()\n",
    "pos_df = pd.DataFrame({'text':pos})\n",
    "pos_df['pos'] = 1\n",
    "pos_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>text</th>\n",
       "      <th>pos</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>giseye oynayan bir film.mel gibson'in oyunculu...</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>bircok yonden sahip olduklari zayifliklari pop...</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1995 ten bu yana bu tür filmler artti , o zama...</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>mel gibson tam bir ingiliz düsmani her filmind...</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>milliyetçi bir film tavsiye etmiyorum.... \\n</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                text  pos\n",
       "0  giseye oynayan bir film.mel gibson'in oyunculu...    0\n",
       "1  bircok yonden sahip olduklari zayifliklari pop...    0\n",
       "2  1995 ten bu yana bu tür filmler artti , o zama...    0\n",
       "3  mel gibson tam bir ingiliz düsmani her filmind...    0\n",
       "4       milliyetçi bir film tavsiye etmiyorum.... \\n    0"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "neg = (path_clas/'tr_polarity.neg').open(encoding='iso-8859-9').readlines()\n",
    "neg_df = pd.DataFrame({'text':neg})\n",
    "neg_df['pos'] = 0\n",
    "neg_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.concat([pos_df,neg_df], sort=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_lm = (TextList.from_df(df, path_clas, cols='text', processor=SPProcessor.load(dest))\n",
    "    .split_by_rand_pct(0.1, seed=42)\n",
    "    .label_for_lm()           \n",
    "    .databunch(bs=bs, num_workers=1))\n",
    "\n",
    "data_lm.save(f'{lang}_clas_databunch')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_lm = load_data(path_clas, f'{lang}_clas_databunch', bs=bs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>idx</th>\n",
       "      <th>text</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>▁bile ▁sinema ▁olgusu nun ▁en ▁üst ▁noktalar in dan . . ▁xxbos ▁gerçekten ▁tarihi ▁sava s ▁filmleri ▁ara si nda ▁tar tis ma siz ▁en ▁iyi si ▁ , ▁12 ▁ yi l ▁boyunca ▁ac aba ▁ikincisi ▁çek ir imi ▁diye ▁bekledi gi m ▁bir ▁film ▁ , bel ki ▁william ▁wallace ▁baba sinin ▁ölümünden ▁sonra ▁amca si ▁yani na ▁al m isti ▁onu ▁ ye tis tir m isti</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>ka dra j lar i yla ▁konusu yla ▁is le ni siyle ▁insana ▁ iste ▁film ▁böyle ▁çekilir ▁de dir ten ▁kusur suz ▁bir ▁film . ▁xxbos ▁böyle ▁güzel ▁bir ▁yap it ▁olamaz ▁filmde ▁her ▁sey ▁var ▁insani ▁dünya dan ▁ali p ▁gö tür üyor ▁bask a ▁diyar lara ▁film ▁bitti kten ▁sonra ▁epey ▁süre ▁geçmesi ▁gerekiyor ▁tekrar ▁dünya ▁ya ▁dönmek ▁için ▁dikkat ! . ▁xxbos ▁inan ir mi siniz</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>▁o ▁kadar ▁etki len mis tim ▁ki . . . özellikle ▁özgürlük ▁ ug ru na ▁sava san ▁william ▁wallace  in ▁is ken ce ▁edilerek ▁idam ▁edilmesi . . . ve ▁sonunda ▁özgürlük ▁diye ▁hay kir isi . . . ha lan ▁unut ami yorum ▁xxrep ▁4 ▁ . ▁xxbos ▁ilk ▁bu ▁filmi ▁sinema da ▁izledi m ▁ve ▁insan in ▁inan di ktan ▁sonra ▁ ne leri ▁yap a</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>▁için ▁ister ▁istemez ▁oraya ▁gö tür üyor ▁filmin ▁uzun lu gun a ▁al dan ip ta ▁filmi ▁izlemek ten ▁vazgeçme yin ▁xxrep ▁4 ▁ . ▁xxbos ▁harika ▁bir ▁film di ▁xxrep ▁5 ▁ . ▁xxbos ▁mükemmel ▁ötesi . . ▁ . ▁xxbos ▁hiç ▁ a bart mi yorum ▁hayat im da ▁izledi gi m ▁en ▁iyi ▁film lerden ▁biri ▁diye bilir im . tam ▁bir ▁bas yap it ▁nite ligi</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>les iyor . ▁herkes ▁izleme li . . . ▁xxbos ▁tarantino nun ▁en ▁iyi ▁filmlerinden ▁biri . ▁diyalog lar ▁çok ▁iyi . ▁kesinlikle ▁izlenmesi ▁gereken ▁bir ▁film . . ▁xxbos ▁tarantino nun ▁bu ▁filmi ▁kendini ▁belli ▁etti r iyor . hat ta ▁ben ce ▁tarantino nun ▁en ▁iyi ▁filmidir . kendi ne ▁has ▁anlat imi ▁ile ▁bu ▁film ▁hak ka ten ▁sinema ▁sever lerin ▁izleme si ▁gereken ▁bir ▁film .</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "data_lm.show_batch()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [],
   "source": [
    "learn_lm = language_model_learner(data_lm, AWD_LSTM, pretrained_fnames=lm_fns, drop_mult=1.0, wd=0.1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [],
   "source": [
    "lr = 1e-3\n",
    "lr *= bs/48"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: left;\">\n",
       "      <th>epoch</th>\n",
       "      <th>train_loss</th>\n",
       "      <th>valid_loss</th>\n",
       "      <th>time</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>4.762205</td>\n",
       "      <td>4.132825</td>\n",
       "      <td>00:07</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "learn_lm.fit_one_cycle(1, lr*10, moms=(0.8,0.7))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: left;\">\n",
       "      <th>epoch</th>\n",
       "      <th>train_loss</th>\n",
       "      <th>valid_loss</th>\n",
       "      <th>time</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>4.136730</td>\n",
       "      <td>3.976010</td>\n",
       "      <td>00:08</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>3.991336</td>\n",
       "      <td>3.882426</td>\n",
       "      <td>00:08</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>3.768218</td>\n",
       "      <td>3.812365</td>\n",
       "      <td>00:08</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>3.526307</td>\n",
       "      <td>3.795765</td>\n",
       "      <td>00:08</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>3.310204</td>\n",
       "      <td>3.815228</td>\n",
       "      <td>00:08</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "learn_lm.unfreeze()\n",
    "learn_lm.fit_one_cycle(5, slice(lr/10,lr*10), moms=(0.8,0.7))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [],
   "source": [
    "learn_lm.save(f'{lang}fine_tuned')\n",
    "learn_lm.save_encoder(f'{lang}fine_tuned_enc')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Classifier"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_clas = (TextList.from_df(df, path_clas, cols='text', processor=SPProcessor.load(dest))\n",
    "    .split_by_rand_pct(0.1, seed=42)\n",
    "    .label_from_df(cols='pos')\n",
    "    .databunch(bs=bs, num_workers=1))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [],
   "source": [
    "learn_c = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5, pretrained=False, wd=0.1).to_fp16()\n",
    "learn_c.load_encoder(f'{lang}fine_tuned_enc')\n",
    "learn_c.freeze()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [],
   "source": [
    "lr=2e-2\n",
    "lr *= bs/48"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: left;\">\n",
       "      <th>epoch</th>\n",
       "      <th>train_loss</th>\n",
       "      <th>valid_loss</th>\n",
       "      <th>accuracy</th>\n",
       "      <th>time</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>0.456195</td>\n",
       "      <td>0.441739</td>\n",
       "      <td>0.817073</td>\n",
       "      <td>00:02</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>0.416383</td>\n",
       "      <td>0.359390</td>\n",
       "      <td>0.841463</td>\n",
       "      <td>00:02</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "learn_c.fit_one_cycle(2, lr, moms=(0.8,0.7))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: left;\">\n",
       "      <th>epoch</th>\n",
       "      <th>train_loss</th>\n",
       "      <th>valid_loss</th>\n",
       "      <th>accuracy</th>\n",
       "      <th>time</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>0.413642</td>\n",
       "      <td>0.381773</td>\n",
       "      <td>0.828330</td>\n",
       "      <td>00:02</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>0.407912</td>\n",
       "      <td>0.358319</td>\n",
       "      <td>0.846154</td>\n",
       "      <td>00:02</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "learn_c.fit_one_cycle(2, lr, moms=(0.8,0.7))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: left;\">\n",
       "      <th>epoch</th>\n",
       "      <th>train_loss</th>\n",
       "      <th>valid_loss</th>\n",
       "      <th>accuracy</th>\n",
       "      <th>time</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>0.358337</td>\n",
       "      <td>0.310041</td>\n",
       "      <td>0.869606</td>\n",
       "      <td>00:02</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>0.279921</td>\n",
       "      <td>0.276278</td>\n",
       "      <td>0.892120</td>\n",
       "      <td>00:02</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "learn_c.freeze_to(-2)\n",
    "learn_c.fit_one_cycle(2, slice(lr/(2.6**4),lr), moms=(0.8,0.7))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: left;\">\n",
       "      <th>epoch</th>\n",
       "      <th>train_loss</th>\n",
       "      <th>valid_loss</th>\n",
       "      <th>accuracy</th>\n",
       "      <th>time</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>0.258410</td>\n",
       "      <td>0.261940</td>\n",
       "      <td>0.899625</td>\n",
       "      <td>00:03</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>0.183993</td>\n",
       "      <td>0.267610</td>\n",
       "      <td>0.905253</td>\n",
       "      <td>00:02</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "learn_c.freeze_to(-3)\n",
    "learn_c.fit_one_cycle(2, slice(lr/2/(2.6**4),lr/2), moms=(0.8,0.7))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "    <div>\n",
       "        <style>\n",
       "            /* Turns off some styling */\n",
       "            progress {\n",
       "                /* gets rid of default border in Firefox and Opera. */\n",
       "                border: none;\n",
       "                /* Needs to be in here for Safari polyfill so background images work as expected. */\n",
       "                background-size: auto;\n",
       "            }\n",
       "            .progress-bar-interrupted, .progress-bar-interrupted::-webkit-progress-bar {\n",
       "                background: #F44336;\n",
       "            }\n",
       "        </style>\n",
       "      <progress value='2' class='' max='4', style='width:300px; height:20px; vertical-align: middle;'></progress>\n",
       "      50.00% [2/4 00:07<00:07]\n",
       "    </div>\n",
       "    \n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: left;\">\n",
       "      <th>epoch</th>\n",
       "      <th>train_loss</th>\n",
       "      <th>valid_loss</th>\n",
       "      <th>accuracy</th>\n",
       "      <th>time</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>0.115543</td>\n",
       "      <td>0.290134</td>\n",
       "      <td>0.902439</td>\n",
       "      <td>00:03</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>0.107257</td>\n",
       "      <td>0.297127</td>\n",
       "      <td>0.911820</td>\n",
       "      <td>00:03</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table><p>\n",
       "\n",
       "    <div>\n",
       "        <style>\n",
       "            /* Turns off some styling */\n",
       "            progress {\n",
       "                /* gets rid of default border in Firefox and Opera. */\n",
       "                border: none;\n",
       "                /* Needs to be in here for Safari polyfill so background images work as expected. */\n",
       "                background-size: auto;\n",
       "            }\n",
       "            .progress-bar-interrupted, .progress-bar-interrupted::-webkit-progress-bar {\n",
       "                background: #F44336;\n",
       "            }\n",
       "        </style>\n",
       "      <progress value='17' class='' max='74', style='width:300px; height:20px; vertical-align: middle;'></progress>\n",
       "      22.97% [17/74 00:01<00:03 0.0947]\n",
       "    </div>\n",
       "    "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "learn_c.unfreeze()\n",
    "learn_c.fit_one_cycle(4, slice(lr/10/(2.6**4),lr/10), moms=(0.8,0.7))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Accuracy in Gezici (2018), *Sentiment Analysis in Turkish* is: `75.16%`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 158,
   "metadata": {},
   "outputs": [],
   "source": [
    "learn_c.save(f'{lang}clas')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## fin"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
